RECURSIVE ESTIMATION OF TIME-AVERAGE VARIANCE CONSTANTS

By Wei Biao Wu 

University of Chicago 

For statistical inference of means of stationary processes, one needs to estimate their time-average variance constants (TAVC) or long-run variances. For a stationary process, its TAVC is the sum of all its covariances and it is a multiple of the spectral density at zero. The classical TAVC estimate, which is based on batched means, does not allow recursive updates and the required memory complexity is O(n). We propose a faster algorithm which recursively computes the TAVC, thus having memory complexity of order O(1), and the computational complexity scales linearly in n. Under short-range dependence conditions, we establish moment and almost sure convergence of the recursive TAVC estimate. Convergence rates are also obtained.

1. Introduction. Let $(X_i)_{i\in\mathbb{Z}}$ be a stationary and ergodic process with mean $\mu = E(X_n)$ and finite variance; let $\gamma(k) = \mathrm{cov}(X_0, X_k)$, $k \in \mathbb{Z}$, be the covariance function. Given the observations $X_1, \ldots, X_n$, a simple estimate of $\mu$ is the sample mean $\bar X_n = n^{-1}\sum_{i=1}^n X_i$. Under suitable conditions on $(X_i)$, $\bar X_n$ is asymptotically normal:

$$\sqrt{n}(\bar X_n - \mu) = n^{-1/2}\sum_{i=1}^{n}(X_i - \mu) \Rightarrow N(0, \sigma^2), \tag{1}$$

where $\Rightarrow$ denotes convergence in distribution and $\sigma^2$ is called the time-average variance constant (TAVC), long-run variance or asymptotic variance parameter. Goodman and Sokal (1989) called $\sigma^2/\gamma(0)$ the integrated autocorrelation time. There exists a huge literature on the central limit theory
for stationary processes. See, for example, Ibragimov and Linnik (1971) and 
Bradley (2007). 
To conduct statistical inference for $\mu$, one needs to estimate $\sigma^2$. Under suitable conditions, $\sigma^2 = \sum_{k\in\mathbb{Z}}\gamma(k)$. The estimation of $\sigma^2$ is an important problem in statistical inference of time series and it has a long history. Given $X_1, \ldots, X_n$, let $1 < l_n < n$ be the block length satisfying $l_n \to \infty$ and $l_n/n \to 0$. Based on the batched means $\sum_{i=j}^{j+l_n-1}X_i/l_n$, $1 \le j \le n - l_n + 1$, one can estimate $\sigma^2$ by

$$\tilde\sigma_n^2(l_n) = \frac{1}{l_n(n - l_n + 1)}\sum_{j=1}^{n-l_n+1}\Big(\sum_{i=j}^{j+l_n-1}X_i - l_n\bar X_n\Big)^2. \tag{2}$$

The estimate $\tilde\sigma_n^2(l_n)$ appears in several contexts and it is closely related to Bartlett's spectral density estimate. As an alternative, one can propose a similar estimate by using the nonoverlapping batched means $\sum_{i=j}^{j+l_n-1}X_i/l_n$, $j = 1, 1 + l_n, 1 + 2l_n, \ldots$. Asymptotic properties of $\tilde\sigma_n^2(l_n)$ have been extensively studied; see, for example, Alexopoulos and Goldsman (2004), Song and Schmeiser (1995), Bühlmann (2002), Lahiri (2003), Politis, Romano and Wolf (1999) and Jones et al. (2006), among others. For other works on estimation of $\sigma^2$, Chauveau and Diebolt (2003) used multiple parallel chains, and Robert (1995) considered Harris recurrent chains. The estimation of $\sigma^2$ is related to the problem of Markov chain Monte Carlo (MCMC) convergence assessment; see Brooks and Roberts (1998), Chauveau and Diebolt (1999) and Chauveau, Diebolt and Robert (1998), among others.
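As an illustration of (2), here is a minimal Python sketch of the overlapping batched-means estimate, assuming the whole sample is held in memory as a list; the function name `batched_means_tavc` is hypothetical. A sliding window makes each batch sum cheap to update, but the entire sample must still be stored, which is the O(n) memory cost discussed next.

```python
from typing import Sequence


def batched_means_tavc(x: Sequence[float], block_len: int) -> float:
    """Overlapping batched-means estimate (2) of the TAVC sigma^2.

    Uses all n - l + 1 windows of length l = block_len:
        (1 / (l * (n - l + 1))) * sum_j (window_sum_j - l * xbar)^2
    """
    n = len(x)
    l = block_len
    if not 1 <= l <= n:
        raise ValueError("need 1 <= block_len <= len(x)")
    xbar = sum(x) / n
    window = sum(x[:l])              # X_1 + ... + X_l
    total = (window - l * xbar) ** 2
    for j in range(1, n - l + 1):    # slide the window one step to the right
        window += x[j + l - 1] - x[j - 1]
        total += (window - l * xbar) ** 2
    return total / (l * (n - l + 1))
```

If the block length changes from $l_n$ to $l_{n+1}$, every window sum has to be recomputed from the stored data.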

It is well known that $\bar X_n$ can be recursively computed in the sense that, if a new observation $X_{n+1}$ is available, then $\bar X_{n+1}$ can be computed as $(n\bar X_n + X_{n+1})/(n+1)$. Hence, the memory complexity for computing $\bar X_n$ is $O(1)$. However, this nice property is no longer present in the estimate $\tilde\sigma_n^2(l_n)$ in (2). There is no simple algebraic relation between $\tilde\sigma_{n+1}^2(l_{n+1})$ and $\tilde\sigma_n^2(l_n)$. To compute $\tilde\sigma_{n+1}^2(l_{n+1})$ when $l_n \ne l_{n+1}$, one has to update all batched means, and the memory complexity is $O(n)$. In computationally intensive problems, it is desirable to have a recursive estimate. For example, in MCMC experiments, one sequentially generates $X_1, X_2, \ldots$. At each stage, based on (1), a $(1 - \alpha)$ confidence interval of $\mu$ can be constructed as $\bar X_n \pm z_{1-\alpha/2}\hat\sigma_n/\sqrt n$, where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$th percentile of a standard normal distribution, $0 < \alpha < 1$. As argued in Geyer (1992), Fishman (1996) and Jones et al. (2006), among others, for convergence diagnostics of Markov chain Monte Carlo algorithms, one can terminate the simulation by choosing $n$ such that the interval is sufficiently small. Quick update of $\hat\sigma_n$ is essential for efficient sequential monitoring and testing. For example, to test the hypothesis $\mu = \mu_0$, we can consider the test statistic $\sqrt n|\bar X_n - \mu_0|/\hat\sigma_n$, which can be quickly calculated via sequential updating.
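The recursive mean update, the confidence interval and the test statistic above can be sketched in Python as follows. This is a minimal illustration: the class name `RunningMean` is hypothetical, and `sigma_hat` is assumed to be supplied by some TAVC estimate (for example, the recursive one introduced in Section 2).

```python
import math
from statistics import NormalDist


class RunningMean:
    """O(1)-memory recursive sample mean: xbar_{n+1} = (n xbar_n + x_{n+1}) / (n + 1)."""

    def __init__(self) -> None:
        self.n = 0
        self.xbar = 0.0

    def update(self, x: float) -> float:
        self.xbar = (self.n * self.xbar + x) / (self.n + 1)
        self.n += 1
        return self.xbar

    def confidence_interval(self, sigma_hat: float, alpha: float = 0.05):
        """(1 - alpha) interval xbar_n +/- z_{1-alpha/2} * sigma_hat / sqrt(n)."""
        z = NormalDist().inv_cdf(1 - alpha / 2)
        half = z * sigma_hat / math.sqrt(self.n)
        return self.xbar - half, self.xbar + half

    def test_statistic(self, mu0: float, sigma_hat: float) -> float:
        """sqrt(n) * |xbar_n - mu0| / sigma_hat for testing H0: mu = mu0."""
        return math.sqrt(self.n) * abs(self.xbar - mu0) / sigma_hat
```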

A common practice in MCMC simulations is to run multiple i.i.d. copies of the chain. One can run, for example, 100 copies of the chain and then conduct convergence diagnostics based on a comparison of the asymptotic variances of the individual chains. In such cases the computational and memory advantage of our recursive algorithm is even more appealing.

The rest of the paper is structured as follows. A sequential estimate $\hat\sigma_n^2$ of $\sigma^2$ is introduced in Section 2. Namely, at each stage $n$, $\hat\sigma_n^2$ can be updated within $O(1)$ steps, so that the computational complexity scales linearly in $n$. The moment and almost sure convergence properties are presented in Section 3 and some implementation issues are discussed in Section 4. Section 5 provides applications to Markov chains and linear processes. Proofs are given in the Appendix.

We now introduce some notation. A random variable $\xi$ is said to be in $\mathcal{L}^p$ ($p > 0$) if $\|\xi\|_p := [E(|\xi|^p)]^{1/p} < \infty$. Write $\|\xi\| = \|\xi\|_2$. For two real sequences $(a_n)$ and $(b_n)$, write $a_n \sim b_n$ if $\lim_{n\to\infty}a_n/b_n = 1$ and $a_n \asymp b_n$ if there exists a constant $c > 0$ such that $1/c < |a_n/b_n| < c$ for all large $n$. Let $S_n = X_1 + \cdots + X_n - n\mu$ and $S_n^* = \max_{i\le n}|S_i|$.

2. Recursive TAVC estimates. For ease of reading, we assume at the outset that $\mu = 0$. To define our recursive TAVC estimate, let $(a_k)_{k\in\mathbb{N}}$ be a strictly increasing integer-valued sequence such that $a_1 = 1$ and $a_{k+1} - a_k \to \infty$ as $k \to \infty$. Based on $(a_k)_{k\in\mathbb{N}}$, define another sequence $(t_i)_{i\in\mathbb{N}}$ as $t_i = a_k$ if $a_k \le i < a_{k+1}$. As a simple example, let $a_k = k^2$. Then $t_i = \lfloor\sqrt i\rfloor^2$, where $\lfloor u\rfloor = \max\{k \in \mathbb{Z} : k \le u\}$ is the integer part of $u$. Given $X_1, \ldots, X_n$, define

$$V_n = \sum_{i=1}^{n}W_i^2, \quad\text{where } W_i = X_{t_i} + X_{t_i+1} + \cdots + X_i, \tag{3}$$

and

$$v_n = \sum_{i=1}^{n}l_i, \quad\text{where } l_i = i - t_i + 1. \tag{4}$$

We propose to estimate the TAVC $\sigma^2$ by $V_n/v_n$. In the estimate (2), for a given $n$, the block size $l_n$ is the same for different blocks. In $V_n$, however, different blocks have different block lengths. Let $B_k = \{a_k, a_k + 1, \ldots, a_{k+1} - 1\}$. Assume $a_k \le n < a_{k+1}$. Then $t_n = a_k$ and $W_n = X_{a_k} + X_{a_k+1} + \cdots + X_n$. If $n + 1 \ne t_{n+1}$, then $n + 1$ still belongs to the block $B_k$ and $W_{n+1} = W_n + X_{n+1}$. On the other hand, if $n + 1 = t_{n+1}$, then $t_{n+1} = a_{k+1}$, $n + 1$ belongs to the next block $B_{k+1}$, and $W_{n+1}$ now becomes $X_{n+1}$. Combining these two cases, we have $W_{n+1} = W_n\mathbf{1}_{n+1\ne t_{n+1}} + X_{n+1}$. For $n \in \mathbb{N}$, choose $k_n \in \mathbb{N}$ such that $a_{k_n} \le n < a_{1+k_n}$. Then $a_{k_n} = t_n$. To summarize, we propose the following recursive algorithm:

Algorithm 1. At stage $n$, we store $(n, k_n, a_{k_n}, v_n, V_n, W_n)$. Note that $t_n = a_{k_n}$. At stage $n + 1$, we update the vector by:

1. If $n + 1 = a_{1+k_n}$, let $k_{n+1} = 1 + k_n$ and $W_{n+1} = X_{n+1}$; if $n + 1 \ne a_{1+k_n}$, then let $k_{n+1} = k_n$ and $W_{n+1} = W_n + X_{n+1}$.

2. $V_{n+1} = V_n + W_{n+1}^2$.

3. $v_{n+1} = v_n + (n + 2 - t_{n+1})$, where $t_{n+1} = a_{k_{n+1}}$.

Output: $\hat\sigma_{n+1}^2 = V_{n+1}/v_{n+1}$.
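Algorithm 1 can be sketched in Python as a generator, assuming (as in the text) that $\mu = 0$ is known. The function name `recursive_tavc` and the callable `a` (the block-start sequence, assumed strictly increasing with `a(1) == 1`) are hypothetical; the loop processes observation $n$ directly, which is equivalent to the $n \to n + 1$ update above.

```python
from typing import Callable, Iterable, Iterator


def recursive_tavc(xs: Iterable[float],
                   a: Callable[[int], int]) -> Iterator[float]:
    """Algorithm 1 (mean assumed known and equal to 0).

    Only (n, k_n, v_n, V_n, W_n) is kept, so memory is O(1).
    Yields V_n / v_n after each observation.
    """
    n = 0
    k = 1          # k_n, so that t_n = a(k_n)
    v = 0          # v_n = sum of l_i
    V = 0.0        # V_n = sum of W_i^2
    W = 0.0        # W_n = X_{t_n} + ... + X_n
    for x in xs:
        n += 1                     # index of the observation being processed
        if n == a(k + 1):          # n starts the next block B_{k+1}
            k += 1
            W = x
        else:                      # n stays in the current block
            W += x
        V += W * W
        v += n - a(k) + 1          # l_n = n - t_n + 1
        yield V / v
```

For example, with $a_k = k^2$ and nine observations all equal to 1, the block lengths are $1, 2, 3, 1, \ldots, 5, 1$, so the final output is $70/22$.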

To implement Algorithm 1, one needs to specify the sequence $(a_k)_{k\ge1}$. A simple choice is $a_k = \lfloor ck^p\rfloor$, $k \ge 1$, where $c > 0$ and $p > 1$ are constants (cf. Remark 2 and Theorem 2). We now compute $t_n$ for the sequence $(a_k)_{k\ge1}$. To this end, let $k \in \mathbb{N}$ be such that $a_k \le n < a_{k+1}$. Then

$$ck^p - 1 < \lfloor ck^p\rfloor \le n \le \lfloor c(k+1)^p\rfloor - 1 \le c(k+1)^p - 1.$$

Solving $k = k_n$ from the preceding inequality, we obtain

$$t_n = a_{k_n}, \quad\text{where } k_n = \lceil(c^{-1}(n+1))^{1/p}\rceil - 1 \text{ and } \lceil u\rceil = \min\{z \in \mathbb{Z} : z \ge u\}. \tag{5}$$

With the above formula, it is easy to check the condition $n + 1 = t_{n+1}$ in step 1 of Algorithm 1. In the special case with $c = 1$ and $p = 2$, $n + 1 = t_{n+1}$ if and only if $(n+1)^{1/2} \in \mathbb{N}$.
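Formula (5) can be evaluated directly; below is a minimal Python sketch. The helper names are hypothetical, and it assumes $1 \le c < 2$ so that $a_1 = 1$. Because the closed form is computed in floating point, two short correction loops guard against rounding at block boundaries.

```python
import math


def a_seq(k: int, c: float, p: float) -> int:
    """a_k = floor(c * k^p); assumes 1 <= c < 2 so that a_1 = 1."""
    return math.floor(c * k ** p)


def block_index(n: int, c: float, p: float) -> int:
    """k_n = ceil(((n + 1) / c)^(1/p)) - 1, so that a_{k_n} <= n < a_{k_n + 1}."""
    k = max(1, math.ceil(((n + 1) / c) ** (1.0 / p)) - 1)
    while k > 1 and a_seq(k, c, p) > n:      # closed form overshot
        k -= 1
    while a_seq(k + 1, c, p) <= n:           # closed form undershot
        k += 1
    return k


def t_of(n: int, c: float, p: float) -> int:
    """t_n = a_{k_n}, the first index of the block containing n."""
    return a_seq(block_index(n, c, p), c, p)
```

With $c = 1$ and $p = 2$ this reproduces $t_n = \lfloor\sqrt n\rfloor^2$, and $n + 1 = t_{n+1}$ exactly when $n + 1$ is a perfect square.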

Algorithm 1 is not yet directly applicable in practical situations since $\mu$ is unknown and $W_i$ needs to be centered. A natural centering sequence is the sample mean $\bar X_n = \sum_{i=1}^n X_i/n$. Based on $V_n$ in (3), we propose

$$V_n' = \sum_{i=1}^{n}(W_i')^2, \quad\text{where } W_i' = X_{t_i} + X_{t_i+1} + \cdots + X_i - l_i\bar X_n, \tag{6}$$

where we recall $l_i = i - t_i + 1$. Observe that $(W_i')^2 - W_i^2 = (l_i\bar X_n)^2 - 2l_iW_i\bar X_n$. To recursively compute $V_n'$, we also need to introduce

$$U_n = \sum_{i=1}^{n}l_iW_i \quad\text{and}\quad q_n = \sum_{i=1}^{n}l_i^2.$$

Then

$$V_n' = V_n - 2U_n\bar X_n + q_n(\bar X_n)^2. \tag{7}$$
Algorithm 1 can be modified as follows:

Algorithm 2. At stage $n$, we store $(n, k_n, a_{k_n}, v_n, q_n, U_n, V_n, W_n, \bar X_n)$. At stage $n + 1$, we update the vector by:

1. If $n + 1 = a_{1+k_n}$, let $k_{n+1} = 1 + k_n$; otherwise let $k_{n+1} = k_n$. Set $t_{n+1} = a_{k_{n+1}}$.

2. $\bar X_{n+1} = (n\bar X_n + X_{n+1})/(n + 1)$,

3. $q_{n+1} = q_n + (n + 2 - t_{n+1})^2$,

4. $v_{n+1} = v_n + (n + 2 - t_{n+1})$,

5. $W_{n+1} = X_{n+1} + W_n\mathbf{1}_{n+1\ne t_{n+1}}$,

6. $V_{n+1} = V_n + W_{n+1}^2$,

7. $U_{n+1} = U_n + (n + 2 - t_{n+1})W_{n+1}$,

8. $V_{n+1}' = V_{n+1} - 2U_{n+1}\bar X_{n+1} + q_{n+1}(\bar X_{n+1})^2$.

Output: $\hat\sigma_{n+1}^2 = V_{n+1}'/v_{n+1}$.
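A minimal Python sketch of Algorithm 2 follows, with a brute-force evaluation of (6) included only to check the recursion. The class `RecursiveTAVC`, the helper `centered_V_direct` and the callable `a` (assumed strictly increasing with `a(1) == 1`) are hypothetical names; each update costs $O(1)$ time and memory, while the checker is $O(n)$.

```python
from typing import Callable, List


class RecursiveTAVC:
    """Algorithm 2: recursive TAVC estimate V'_n / v_n with unknown mean."""

    def __init__(self, a: Callable[[int], int]) -> None:
        self.a = a
        self.n, self.k = 0, 1
        self.v = 0        # sum of l_i
        self.q = 0        # sum of l_i^2
        self.U = 0.0      # sum of l_i * W_i
        self.V = 0.0      # sum of W_i^2 (uncentered)
        self.W = 0.0      # current uncentered block partial sum
        self.xbar = 0.0

    def update(self, x: float) -> float:
        n1 = self.n + 1                            # index of the new observation
        if n1 == self.a(self.k + 1):               # step 1: a new block starts
            self.k += 1
            self.W = 0.0                           # indicator in step 5 is 0
        self.xbar = (self.n * self.xbar + x) / n1  # step 2
        l = n1 - self.a(self.k) + 1                # l_{n+1} = n + 2 - t_{n+1}
        self.q += l * l                            # step 3
        self.v += l                                # step 4
        self.W += x                                # step 5
        self.V += self.W ** 2                      # step 6
        self.U += l * self.W                       # step 7
        self.n = n1
        return self.estimate()

    def estimate(self) -> float:
        # step 8 and output: V'_n = V_n - 2 U_n xbar_n + q_n xbar_n^2, divided by v_n
        centered = self.V - 2.0 * self.U * self.xbar + self.q * self.xbar ** 2
        return centered / self.v


def centered_V_direct(xs: List[float], a: Callable[[int], int]) -> float:
    """Brute-force V'_n from (6), used only to check the recursion."""
    n = len(xs)
    xbar = sum(xs) / n
    total, k = 0.0, 1
    for i in range(1, n + 1):
        while a(k + 1) <= i:       # find k with a(k) <= i < a(k + 1)
            k += 1
        t = a(k)
        w = sum(xs[t - 1:i]) - (i - t + 1) * xbar
        total += w * w
    return total
```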

At stage $n$, based on $\hat\sigma_n^2 = V_n'/v_n$, we can construct the $(1 - \alpha)$ confidence interval for $\mu$ as $\bar X_n \pm \hat\sigma_nz_{1-\alpha/2}/\sqrt n$. The convergence rate of $\hat\sigma_n^2$ certainly depends on the sequence $(a_k)$, as well as on the dependence structure of the underlying process. Section 3 concerns the convergence properties of $\hat\sigma_n^2$.

It is easily seen that the above recursive algorithms can be generalized to spectral density estimation. Let

$$f(\theta) = \frac{1}{2\pi}\sum_{k\in\mathbb{Z}}\gamma(k)e^{\sqrt{-1}k\theta} = \frac{1}{2\pi}\sum_{k\in\mathbb{Z}}\gamma(k)\cos(k\theta), \quad \theta \in \mathbb{R},$$

be the spectral density function, where $\sqrt{-1}$ is the imaginary unit. Assume that $EX_i = 0$. As in (3), we can introduce

$$V_n(\theta) = \sum_{i=1}^{n}|W_i(\theta)|^2, \quad\text{where } W_i(\theta) = \sum_{j=t_i}^{i}X_je^{\sqrt{-1}j\theta},$$

and recursively estimate $f(\theta)$ at a given $\theta \in \mathbb{R}$ by $f_n(\theta) = V_n(\theta)/(2\pi v_n)$. The latter can be viewed as a version of Bartlett's spectral density estimate with varying block lengths. Using arguments similar to (but lengthier than) those in the Appendix, we can obtain similar convergence results for $f_n(\theta)$. The details are omitted since our primary focus is the inference of sample means of stationary processes.
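The same one-pass recursion carries over to $f_n(\theta)$, with $W_i(\theta)$ kept as a complex partial sum. Below is a minimal Python sketch assuming $EX_i = 0$ (no centering), with a hypothetical function name and block-start callable `a`. At $\theta = 0$ it reduces to $V_n/(2\pi v_n)$, in line with $\sigma^2 = 2\pi f(0)$.

```python
import cmath
import math
from typing import Callable, Iterable


def recursive_spectral(xs: Iterable[float], theta: float,
                       a: Callable[[int], int]) -> float:
    """f_n(theta) = V_n(theta) / (2 pi v_n), computed in one pass with O(1) memory."""
    n, k, v = 0, 1, 0
    V = 0.0
    W = 0j                                   # W_n(theta), a complex partial sum
    for x in xs:
        n += 1
        if n == a(k + 1):                    # new block: restart W
            k += 1
            W = 0j
        W += x * cmath.exp(1j * n * theta)   # add X_n e^{sqrt(-1) n theta}
        V += abs(W) ** 2
        v += n - a(k) + 1                    # l_n = n - t_n + 1
    return V / (2 * math.pi * v)
```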

3. Convergence properties. For the recursive estimate $\hat\sigma_n^2$ proposed in Section 2, a natural question is to study its convergence properties. The latter problem is far from trivial. Here we shall apply the dependence measures proposed in Wu (2005) and obtain moment and almost sure convergence of $\hat\sigma_n^2$.

We first make some structural assumptions on the dependence. Assume hereafter that $(X_i)$ is a stationary causal process of the form

$$X_i = g(\ldots, \varepsilon_{i-1}, \varepsilon_i), \tag{8}$$

where $\varepsilon_i$, $i \in \mathbb{Z}$, are i.i.d. innovations and $g$ is a measurable function such that $X_i$ is well defined. The framework (8) is very general and it allows many widely used linear and nonlinear processes. As in Wiener (1958) and Priestley (1988), (8) can be interpreted as a physical system with $\mathcal{F}_i = (\ldots, \varepsilon_{i-1}, \varepsilon_i)$ being the input, $g$ being a filter and $X_i$ being the output. Wiener (1958) dealt with the problem of representing stationary and ergodic processes as shifts of functions of independent random variables; see Rosenblatt (1959), Tong (1990) and Borkar (1993). Based on (8), Wu (2005) introduced the physical and predictive dependence measures, which quantify the degree of dependence of outputs on inputs. Specifically, let $\varepsilon_0', \varepsilon_j$, $j \in \mathbb{Z}$, be i.i.d. random variables and $\mathcal{F}_0' = (\ldots, \varepsilon_{-2}, \varepsilon_{-1}, \varepsilon_0')$; let $g_i(\mathcal{F}_0) = E[g(\mathcal{F}_i)\mid\mathcal{F}_0]$. For $p \ge 1$ define the physical dependence measure

$$\delta_p(i) = \|X_i - X_i'\|_p, \quad\text{where } X_i' = g(\mathcal{F}_0', \varepsilon_1, \ldots, \varepsilon_{i-1}, \varepsilon_i), \tag{9}$$

and the predictive dependence measure

$$\omega_p(i) = \|g_i(\mathcal{F}_0) - g_i(\mathcal{F}_0')\|_p. \tag{10}$$

The process $X_i'$ is a coupled version of $X_i$ with $\varepsilon_0$ replaced by $\varepsilon_0'$. So $\delta_p(i)$ quantifies the contribution of $\varepsilon_0$ to $X_i$ by measuring the distance between $X_i$ and $X_i'$, while $\omega_p(i)$ measures the contribution of $\varepsilon_0$ in predicting future expected values. For details, see Wu (2005).

In comparison with the traditional strong mixing conditions, $\delta_p(i)$ and $\omega_p(i)$ appear more convenient to use in our context and they are directly related to the data-generating mechanisms. Wu (2005) showed that, if the process $(X_i)$ is stable, namely, $\Omega_2 := \sum_{i=0}^\infty\omega_2(i) < \infty$, then (13) below holds with $\sigma \le \Omega_2$. See also Hannan (1979) and Volný (1993). Box, Jenkins and Reinsel (1994) considered the special case of linear processes and interpreted the stability condition as the cumulative impact of a single shock $\varepsilon_0$ on the whole process $(X_i)$ being finite. Main results in the sequel are all expressed in terms of $\delta_p(i)$ and $\omega_p(i)$.



3.1. A representation of $\sigma$. We shall first introduce a useful representation of $\sigma$. Write $S_i = \sum_{j=1}^i X_j$. Assume that $EX_i = 0$ and

$$\sum_{i=0}^{\infty}\|\mathcal{P}_0X_i\|_2 < \infty, \quad\text{where } \mathcal{P}_i = E(\cdot\mid\mathcal{F}_i) - E(\cdot\mid\mathcal{F}_{i-1}). \tag{11}$$

Then

$$D_k := \sum_{i=k}^{\infty}\mathcal{P}_kX_i \in \mathcal{L}^2 \tag{12}$$

and $(D_k)_{k\in\mathbb{Z}}$ is a stationary martingale difference sequence with respect to the natural filter $\mathcal{F}_k$. Additionally, by Theorem 1 in Hannan (1979), we have the invariance principle

$$\Big\{\frac{S_{\lfloor nt\rfloor}}{\sqrt n}, 0 \le t \le 1\Big\} \Rightarrow \{\sigma B(t), 0 \le t \le 1\}, \quad\text{where } \sigma = \|D_k\|_2. \tag{13}$$

Here $B$ is the standard Brownian motion. Let $M_n = \sum_{i=1}^n D_i$. If (11) holds with $\|\cdot\|_2$ replaced by $\|\cdot\|_\alpha$ for some $\alpha > 2$ [cf. (14) below], then we have $\|S_n - M_n\|_\alpha = o(\sqrt n)$ [see Theorem 1 in Wu (2007)]. The operator $\mathcal{P}_i$ in (11) is called the projection operator and it naturally generates martingale differences. The representation of $\sigma$ in (13) is useful in the analysis of our estimates.



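3.2. Moment convergence. We first present a general result on moment convergence properties of $V_n/v_n$ under mild dependence conditions. Recall (11) for the definition of the projection operator $\mathcal{P}_i = E(\cdot\mid\mathcal{F}_i) - E(\cdot\mid\mathcal{F}_{i-1})$.

Theorem 1. Let $EX_i = 0$ and $X_i \in \mathcal{L}^\alpha$, $\alpha > 2$. Assume

$$\sum_{i=0}^{\infty}\|\mathcal{P}_0X_i\|_\alpha < \infty. \tag{14}$$

Further assume that, as $m \to \infty$, $a_{m+1} - a_m \to \infty$ and

$$\lim_{m\to\infty}\frac{(a_{m+1} - a_m)^2}{\sum_{k=2}^{m}(a_k - a_{k-1})^2} = 0. \tag{15}$$

Then $\|V_n/v_n - \sigma^2\|_{\alpha/2} = o(1)$.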



Theorem 1 implies that, for consistency of $V_n/v_n$, $X_i$ does not need to have a finite fourth moment. Instead, the moment condition $X_i \in \mathcal{L}^\alpha$ with $\alpha > 2$ suffices. We now discuss conditions (14) and (15) in the following remarks.



Remark 1. By Jensen's inequality, we have $\|\mathcal{P}_0X_i\|_\alpha \le \omega_\alpha(i) \le 2\|\mathcal{P}_0X_i\|_\alpha$; see Theorem 1 in Wu (2005). Then (14) is equivalent to the stability condition $\sum_{j=0}^\infty\omega_\alpha(j) < \infty$ [Wu (2005)]. The latter condition can be interpreted as follows: the cumulative contribution of $\varepsilon_0$ in predicting future values $(X_j)_{j\ge0}$ is finite, thus suggesting short-range dependence. For long-range dependent processes (14) is violated and $\sigma^2$ does not always exist; see Section 5.2. So (14) is a very natural condition.

Remark 2. Theorem 1 imposes mild conditions on the sequence $(a_k)_{k\ge1}$. The theorem is applicable if $a_k = \lfloor ck^p\rfloor$, where $p > 1$ and $c > 0$ are constants. To account for dependence, it is certainly needed that $a_{m+1} - a_m \to \infty$. Condition (15) does not hold if $a_m$ diverges to infinity too fast. For example, (15) is violated if $a_k = 2^k$. In the latter case $V_n/v_n$ is not a consistent estimate of $\sigma^2$ if $X_i$ are i.i.d. standard normals. To see this, let $\zeta_j$, $j \in \mathbb{Z}$, be independent and identically distributed as $\int_0^1B^2(t)\,dt$, where we recall that $B$ is the standard Brownian motion. Elementary calculations show that $v_{2^m} \sim 2^{2m}/6$ and

$$2^{-2k}\sum_{i=2^k}^{2^{k+1}-1}W_i^2 \Rightarrow \int_0^1B^2(t)\,dt \quad\text{as } k \to \infty.$$

Since $X_i$ are i.i.d., $V_{2^m}/v_{2^m} \Rightarrow (3/2)\sum_{j=0}^\infty\zeta_j/4^j$, a nondegenerate random limit. In contrast, $\sigma = 1$.

Corollary 1 asserts the moment convergence of $\hat\sigma_n^2 = V_n'/v_n$ generated from Algorithm 2, which allows unknown $\mu$.

Corollary 1. Let conditions (14) and (15) of Theorem 1 be satisfied. Then for $\hat\sigma_n^2 = V_n'/v_n$ generated from Algorithm 2, we also have $\|V_n'/v_n - \sigma^2\|_{\alpha/2} = o(1)$.

3.3. Convergence rates. Theorem 1 asserts the moment convergence of $V_n/v_n$ under the mild conditions (14) and (15). However, it does not provide information on the convergence rates. Under suitable decay rates of the dependence measures, Theorem 2 provides a convergence rate of $V_n/v_n$ for algebraic sequences $(a_k)$.

Theorem 2. Let $a_k = \lfloor ck^p\rfloor$, $k \ge 1$, where $c > 0$ and $p > 1$ are constants.

(i) Assume that $X_i \in \mathcal{L}^\alpha$, $EX_i = 0$, and for some $\alpha \in (2, 4]$,

$$\sum_{j=0}^{\infty}\delta_\alpha(j) < \infty. \tag{16}$$

Then

$$\|V_n - EV_n\|_{\alpha/2} = O(n^{3/2-3/(2p)+2/\alpha}). \tag{17}$$

(ii) Assume that $X_i \in \mathcal{L}^\alpha$, $EX_i = 0$ and (16) holds for some $\alpha > 4$. Then

$$\lim_{n\to\infty}\frac{\|V_n - EV_n\|}{n^{2-3/(2p)}} = \frac{\sigma^2p^2c^{3/(2p)}}{\sqrt{12p - 9}}. \tag{18}$$

(iii) If $X_i \in \mathcal{L}^2$, $EX_i = 0$, and for some $q \in (0, 1]$,

$$\sum_{i=0}^{\infty}i^q\omega_2(i) < \infty, \tag{19}$$

then $EV_n - v_n\sigma^2 = O[n^{1+(1-q)(1-1/p)}]$. Consequently, under (16) and (19), $\|V_n - v_n\sigma^2\|_{\alpha/2} = O(n^\phi)$, where $\phi = \max(3/2 - 3/(2p) + 2/\alpha,\ 1 + (1 - q)(1 - 1/p))$.
Since $\omega_2(j) \le \delta_2(j) \le \delta_\alpha(j)$, a sufficient condition for (16) and (19) is

$$\sum_{i=1}^{\infty}i^q\delta_\alpha(i) < \infty.$$

Theorem 2 gives guidance on how to choose $p$ based on the dependence and moment conditions of the process, which are characterized by the parameters $q$ and $\alpha$, respectively. A good $p$ is the minimizer of $n^{3/2-3/(2p)+2/\alpha} + n^{1+(1-q)(1-1/p)}$. Thus $p$ also minimizes $\phi = \phi(p)$. Solving the equation

$$3/2 - 3/(2p) + 2/\alpha = 1 + (1 - q)(1 - 1/p),$$

one obtains $p = (1/2 + q)/(q - 1/2 + 2/\alpha)$. To summarize, we have the following:

Corollary 2. Let $p = (1/2 + q)/(q - 1/2 + 2/\alpha)$. Under the conditions of Theorem 2, we have $\|V_n/v_n - \sigma^2\|_{\alpha/2} = O(n^{2/\alpha-1/2-1/(2p)})$. In particular, if $\alpha = 4$ and $q = 1$, then $p = 3/2$ and $\|V_n/v_n - \sigma^2\|_2 = O(n^{-1/3})$.
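The choice of $p$ in Corollary 2 is a small exact computation; the Python sketch below (hypothetical helper names, exact rational arithmetic via `fractions.Fraction`, assuming $2 < \alpha \le 4$ and $0 < q \le 1$) returns $p$ and the resulting rate exponent. The usage lines confirm the special case $\alpha = 4$, $q = 1$ and that $p$ increases when $q$ decreases.

```python
from fractions import Fraction


def optimal_p(alpha: Fraction, q: Fraction) -> Fraction:
    """p solving 3/2 - 3/(2p) + 2/alpha = 1 + (1 - q)(1 - 1/p)."""
    return (Fraction(1, 2) + q) / (q - Fraction(1, 2) + 2 / alpha)


def rate_exponent(alpha: Fraction, p: Fraction) -> Fraction:
    """Exponent e in ||V_n / v_n - sigma^2||_{alpha/2} = O(n^e) from Corollary 2."""
    return 2 / alpha - Fraction(1, 2) - 1 / (2 * p)


p = optimal_p(Fraction(4), Fraction(1))
print(p, rate_exponent(Fraction(4), p))         # 3/2 -1/3
print(optimal_p(Fraction(4), Fraction(1, 2)))   # 2
```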

Remark 3. Since $a_{k+1} - a_k \sim cpk^{p-1}$ and $m \sim (n/c)^{1/p}$, elementary calculations show that

$$v_{a_m} \sim \sum_{i=1}^{m}\frac{(a_{i+1} - a_i)(a_{i+1} - a_i + 1)}{2} \sim \frac{m^{2p-1}c^2p^2}{4p - 2} \sim v_n. \tag{20}$$

By Remark 4, $\|V_n' - V_n\|_{\alpha/2}/v_n = O(n^{-1/p})$. Hence, Corollary 2 also applies to $\hat\sigma_n^2 = V_n'/v_n$ since $-1/p < 2/\alpha - 1/2 - 1/(2p)$.

Since $2 < \alpha \le 4$, $p$ increases as $q$ decreases. The latter observation can be explained as follows: if (19) only holds for small $q$, then it indicates strong dependence and one needs to choose large block sizes to suppress the dependence.

We now compare Corollary 2 with classical results on the estimation of the TAVC by using batched means. Carlstein (1986) obtained the bound $O(n^{-1/3})$ for the special AR(1) model with i.i.d. normal innovations. Under appropriate strong mixing conditions, one can obtain the optimal mean squared error (MSE) bound $O(n^{-2/3})$ if the batch size is of order $n^{1/3}$; see Künsch (1989) and Lahiri (2003), among others. By Corollary 2, one can obtain the same bound: $\|V_n'/v_n - \sigma^2\|_2^2 = O(n^{-2/3})$, and the gap $a_{m+1} - a_m = \lfloor c(m+1)^{3/2}\rfloor - \lfloor cm^{3/2}\rfloor \sim (3c/2)m^{1/2} \sim (3c^{2/3}/2)n^{1/3}$. For more discussion, see Section 4.

Our results have the attractive feature that they do not require strong mixing conditions, which may be difficult to verify in practice. Also, we impose a very mild moment condition, namely $X_i \in \mathcal{L}^\alpha$ with $2 < \alpha \le 4$.

In view of the recursive nature of our estimate, it is natural to consider its almost sure convergence behavior. In the context of mean estimation based on MCMC simulations, Glynn and Whitt (1992) argued that, for asymptotic validity of sequential confidence intervals, one needs a strongly consistent estimate of $\sigma$, while the weaker version of mere convergence in probability is not enough.

Corollary 3. Under the conditions of Corollary 2, we have

$$\Big\|\max_{n\le N}|V_n - EV_n|\Big\|_{\alpha/2} = O(N^\tau\log N), \quad\text{where } \tau = 3/2 - 3/(2p) + 2/\alpha, \tag{21}$$

and $V_N - EV_N = o[N^\tau(\log N)^2]$ almost surely, and also

$$V_N/v_N - \sigma^2 = o(N^{2/\alpha-1/2-1/(2p)}(\log N)^2) \quad\text{almost surely}. \tag{22}$$

4. Implementation issues. Assume that (19) holds with $q = 1$ and (16) holds with $\alpha > 4$. Let the sequence be $a_k = \lfloor ck^p\rfloor$, $k \ge 1$. To implement Algorithm 2, it is necessary to choose $c$ and $p$. Corollary 2 suggests the optimal $p = 3/2$. Here we shall suggest a data-driven estimate of $c$ by using the procedure in Bühlmann and Künsch (1999).

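Since (19) holds with $q = 1$, $\sum_{k=1}^\infty k|\gamma(k)| < \infty$. So as $l \to \infty$,

$$E(S_l^2) - l\sigma^2 = -2\sum_{k=1}^{\infty}\min(l, k)\gamma(k) = \theta + o(1), \quad\text{where } \theta = -2\sum_{k=1}^{\infty}k\gamma(k).$$

So $EV_n - v_n\sigma^2 = n\theta + o(n)$. By (20), $v_n \sim 9m^2c^2/16$. Since $m \sim (n/c)^{2/3}$, by Theorem 2(ii),

$$\Big\|\frac{V_n}{v_n} - \sigma^2\Big\|^2 = \frac{\|V_n - EV_n\|^2 + |EV_n - v_n\sigma^2|^2}{v_n^2} \sim \frac{16\sigma^4}{9m} + \frac{256\theta^2n^2}{81c^4m^4} \sim \Big(\frac{16c^{2/3}}{9}\sigma^4 + \frac{256}{81c^{4/3}}\theta^2\Big)n^{-2/3}.$$

The MSE-optimal $c$ minimizes $\|V_n'/v_n - \sigma^2\|^2$. Hence,

$$\|V_n'/v_n - \sigma^2\|^2 \sim \frac{2^{14/3}}{3^{5/3}}\theta^{2/3}\sigma^{8/3}n^{-2/3} \quad\text{and}\quad c = \frac{4\sqrt2|\theta|}{3\sigma^2}. \tag{23}$$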
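The minimization behind (23) can be checked numerically. The Python sketch below (hypothetical helper names) codes the asymptotic MSE constant $K(c)$ from the display above for $p = 3/2$, the minimizer $c = 4\sqrt2|\theta|/(3\sigma^2)$ and the minimal value. The values of $\theta$ and $\sigma^2$ in the usage are arbitrary illustrations, not taken from the text.

```python
import math


def recursive_mse_const(c: float, theta: float, sigma2: float) -> float:
    """K(c) in ||V_n/v_n - sigma^2||^2 ~ K(c) n^{-2/3} when p = 3/2."""
    return (16 * c ** (2 / 3) * sigma2 ** 2 / 9
            + 256 * theta ** 2 / (81 * c ** (4 / 3)))


def optimal_c(theta: float, sigma2: float) -> float:
    """Minimizer of K(c): c = 4 sqrt(2) |theta| / (3 sigma^2)."""
    return 4 * math.sqrt(2) * abs(theta) / (3 * sigma2)


def optimal_mse_const(theta: float, sigma2: float) -> float:
    """K at the optimum: 2^{14/3} 3^{-5/3} |theta|^{2/3} sigma^{8/3}."""
    return 2 ** (14 / 3) * 3 ** (-5 / 3) * abs(theta) ** (2 / 3) * sigma2 ** (4 / 3)


theta, s2 = -3 / 16, 0.25          # illustrative values only
c0 = optimal_c(theta, s2)
print(c0, recursive_mse_const(c0, theta, s2), optimal_mse_const(theta, s2))
```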

We now consider the batched mean estimate $\tilde\sigma_n^2(l_n)$ given in (2) with $\bar X_n$ therein replaced by 0. Assume $l_n/n \to 0$ and $l_n \to \infty$. Under suitable strong mixing conditions, we have $\|\tilde\sigma_n^2(l_n) - E\tilde\sigma_n^2(l_n)\|^2 \sim 4\sigma^4l_n/(3n)$ and $E\tilde\sigma_n^2(l_n) - \sigma^2 = (\theta + o(1))/l_n$ [see, e.g., Song and Schmeiser (1995) or Politis, Romano and Wolf (1999)]. So the asymptotic MSE-optimal $l_n$ satisfies

$$\|\tilde\sigma_n^2(l_n) - \sigma^2\|^2 \sim \frac{2^{2/3}3^{1/3}\theta^{2/3}\sigma^{8/3}}{n^{2/3}}, \quad\text{with } l_n = \lfloor A_*n^{1/3}\rfloor \text{ and } A_*^3 = \frac{3\theta^2}{2\sigma^4}. \tag{24}$$
Bühlmann and Künsch (1999) proposed a data-driven method for finding the block length $l_n$. Sherman (1998) considered a similar problem. For the purpose of estimating $c$ in (23), we shall present Bühlmann and Künsch's (1999) algorithm here.

Algorithm 3. Let the Tukey-Hanning window be $w_{TH}(x) = (1 + \cos(\pi x))\mathbf{1}_{|x|\le1}/2$ and the split-cosine window be $w_{SC}(x) = (1 + \cos(5(x - 0.8)\pi))/2$ if $0.8 \le |x| \le 1$; $w_{SC}(x) = 1$ if $|x| < 0.8$ and $w_{SC}(x) = 0$ if $|x| > 1$.

1. Calculate $\hat\gamma(k) = n^{-1}\sum_{i=1}^{n-|k|}(X_i - \bar X_n)(X_{i+|k|} - \bar X_n)$, $k = 1 - n, \ldots, n - 1$.

2. Let $b_0 = n^{-1}$. For $m = 1, 2, 3, 4$, let

3. Let $l_n$ be the closest integer to $1/\hat b$, where

$$\hat b = n^{-1/3}\left(\frac{2\big(\sum_{k=1-n}^{n-1}w_{TH}(kb_4n^{4/21})\hat\gamma(k)\big)^2}{3\big(\sum_{k=1-n}^{n-1}w_{SC}(kb_4n^{4/21})|k|\hat\gamma(k)\big)^2}\right)^{1/3}.$$

By Theorem 4.1 in Bühlmann and Künsch (1999), under suitable conditions, one has asymptotically that $n\hat b^3 \sim 2\sigma^4/(3\theta^2)$. Relation (23) hence suggests the data-driven choice $c = (4A_*/3)^{3/2}$, where $A_* = l_n/n^{1/3}$ and $l_n$ is from Algorithm 3. By (23) and (24), with $c = (4A_*/3)^{3/2}$, we have $\|V_n'/v_n - \sigma^2\|_2/\|\tilde\sigma_n^2(l_n) - \sigma^2\|_2 \sim 4/3$, which suggests that the recursive estimate $V_n'/v_n$ has reasonably good performance compared with the batched mean estimate $\tilde\sigma_n^2(l_n)$. In practice, we can conduct a pilot study and estimate $c$ by using Algorithm 3 with a relatively small $n$. Then we can use this $c$ for our recursive algorithm.
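The pilot-study recipe can be sketched in Python as follows: given a block length $l_n$ from a pilot run of size $n$, set $A_* = l_n/n^{1/3}$ and $c = (4A_*/3)^{3/2}$. The helper names are hypothetical, and the sketch does not implement Algorithm 3 itself. The usage shows that, if $l_n$ equals the ideal $A_*n^{1/3}$ from (24), this $c$ coincides with the optimal $c$ in (23), and that the ratio of the root-MSE constants in (23) and (24) is $4/3$.

```python
import math


def ideal_block_length(theta: float, sigma2: float, n: int) -> float:
    """MSE-optimal batch size from (24): A_* n^{1/3}, A_*^3 = 3 theta^2 / (2 sigma^4)."""
    a_star = (3 * theta ** 2 / (2 * sigma2 ** 2)) ** (1 / 3)
    return a_star * n ** (1 / 3)


def c_from_pilot(l_n: float, n: int) -> float:
    """Data-driven c = (4 A_* / 3)^{3/2}, with A_* = l_n / n^{1/3}."""
    a_star = l_n / n ** (1 / 3)
    return (4 * a_star / 3) ** 1.5


def root_mse_ratio() -> float:
    """sqrt of the ratio of the constants in (23) and (24); theta and sigma cancel."""
    recursive = 2 ** (14 / 3) * 3 ** (-5 / 3)
    batched = 2 ** (2 / 3) * 3 ** (1 / 3)
    return math.sqrt(recursive / batched)


theta, s2, n = -3 / 16, 0.25, 10_000        # illustrative values only
l = ideal_block_length(theta, s2, n)
print(c_from_pilot(l, n), 4 * math.sqrt(2) * abs(theta) / (3 * s2), root_mse_ratio())
```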

The computational and memory advantage of our recursive algorithm is more prominent if one runs multiple copies of the chain. In such applications we may obtain an estimate of $\sigma^2$ for each individual chain, and then use the median or mean of those estimates to obtain an improved estimate. Also, we can check the variation of those TAVC estimates for convergence diagnostics. The computational cost of the traditional nonrecursive algorithms may be very high if the number of copies is large. Chauveau and Diebolt (2003) also considered estimation of $\sigma^2$ based on multiple chains. However, their estimate is not consistent if the number of copies is bounded.

5. Applications. Here we shall apply Theorems 1 and 2 to Markov chains 
which are in the form of iterated random functions and to functionals of 
linear processes. The former is useful in MCMC simulations. 
5.1. Markov chains. Let $\varepsilon_j$, $j \in \mathbb{Z}$, be i.i.d. random variables. Consider the Markov chain $(Y_n)$ recursively defined by

$$Y_n = g(Y_{n-1}, \varepsilon_n), \tag{25}$$

where $g$ is a measurable function. A variety of nonlinear time series models are of the form (25). Diaconis and Freedman (1999) showed that the Markov chain (25) admits a unique stationary distribution provided that

$$E\log L_{\varepsilon_0} < 0, \quad\text{where } L_\varepsilon = \sup_{y\ne y'}\frac{|g(y, \varepsilon) - g(y', \varepsilon)|}{|y - y'|}, \tag{26}$$

and

$$E[L_{\varepsilon_0}^\iota + |g(y_0, \varepsilon_0)|^\iota] < \infty \quad\text{for some } y_0 \text{ and } \iota > 0. \tag{27}$$

Under (26) and (27), by iterating (25), $Y_n$ admits the representation (8). Interestingly, the same set of conditions [namely, (26) and (27)] also implies that $\delta_x(j) = O(r^j)$ for some $r \in (0, 1)$ and $x > 0$; see Wu and Shao (2004).

We now apply Theorem 2 to the process $X_i = h(Y_i)$. In MCMC experiments, $\mu = EX_i$ is estimated by $\bar X_n$ and the length of the confidence interval $\bar X_n \pm z_{1-\alpha/2}\hat\sigma_n/\sqrt n$ can be used for convergence diagnostics [Jones et al. (2006)]. We shall impose regularity conditions on $h$ such that (16) and (19) are satisfied. Assume $X_i \in \mathcal{L}^{\alpha_0}$ for some $\alpha_0 > 2$. For $t > 0$ let

$$\Delta(t) = \sup\{\|[h(Y) - h(Y')]\mathbf{1}_{|Y-Y'|\le t}\|_\alpha : Y \text{ and } Y' \text{ are identically distributed}\}.$$

Following the argument of Theorem 3 in Wu and Shao (2004), under

$$\int_0^1\frac{\Delta(t)}{t}\,dt < \infty, \tag{28}$$

we have $\sum_{i=1}^\infty i\delta_\alpha(i) < \infty$ and, hence, (16) and (19) hold. The details of the derivation are omitted. We now give examples in which (28) holds. If $h$ is Lipschitz continuous, then $\Delta(t) = O(t)$ and (28) follows. Let $h$ be an indicator function $h(y) = \mathbf{1}_{y\le y_0}$, where $y_0$ is fixed. Then (28) also holds if $P(|Y_1 - y_0| \le t) = O(t^\rho)$ for some $\rho > 0$. In particular, if $Y_1$ has a density, then $\rho = 1$.

An attractive feature of our setting is that we do not need the assumptions of irreducibility and positive Harris recurrence. The latter assumptions are not valid for many Markov chains. For example, Markov chains associated with fractal images [Diaconis and Freedman (1999)] are generally not positive Harris recurrent. As a concrete example, consider (25) with $Y_n = (Y_{n-1} + 2\varepsilon_n)/3$, where $\varepsilon_n$ are i.i.d. with distribution $P(\varepsilon_n = 0) = P(\varepsilon_n = 1) = 1/2$. Then the chain is not positive Harris recurrent. On the other hand, (26) and (27) trivially hold and $(Y_n)$ admits an invariant distribution. Additionally, its support is the Cantor set and $P(|Y_1 - y_0| \le t) = O(t^\rho)$, where $\rho = (\log 2)/(\log 3)$ is the Hausdorff dimension.
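To illustrate, the following Python sketch simulates the Cantor-set chain and runs a compact copy of Algorithm 2 with $a_k = \lfloor ck^{3/2}\rfloor$. The function names, the seed and the sample size are arbitrary choices for illustration. Two functionals are assumed: $h(y) = y$ and $h(y) = \mathbf{1}_{y\le1/2}$. For both, the TAVC equals $1/4$: $Y_n$ is an AR(1) process with coefficient $1/3$ and innovation variance $1/9$, so $\sigma^2 = (1/9)/(1 - 1/3)^2 = 1/4$; and $\mathbf{1}_{Y_n\le1/2} = \mathbf{1}_{\varepsilon_n=0}$ is i.i.d. Bernoulli(1/2). The estimates should therefore be close to $0.25$, even though the chain is not positive Harris recurrent.

```python
import math
import random


def cantor_chain(n: int, seed: int = 0, y0: float = 0.0):
    """Y_k = (Y_{k-1} + 2 eps_k) / 3 with P(eps = 0) = P(eps = 1) = 1/2."""
    rng = random.Random(seed)
    y = y0
    for _ in range(n):
        y = (y + 2 * rng.getrandbits(1)) / 3
        yield y


def recursive_tavc_centered(xs, c: float = 1.0, p: float = 1.5) -> float:
    """Compact Algorithm 2 with a_k = floor(c k^p); returns V'_n / v_n at the end."""
    def a(k: int) -> int:
        return math.floor(c * k ** p)

    n, k = 0, 1
    v, q, U, V, W, xbar = 0, 0, 0.0, 0.0, 0.0, 0.0
    for x in xs:
        n += 1
        if n == a(k + 1):          # a new block starts at n
            k += 1
            W = 0.0
        xbar += (x - xbar) / n     # recursive sample mean
        l = n - a(k) + 1           # l_n = n - t_n + 1
        v += l
        q += l * l
        W += x
        V += W * W
        U += l * W
    return (V - 2 * U * xbar + q * xbar ** 2) / v


ys = list(cantor_chain(200_000, seed=1))
print(recursive_tavc_centered(ys))                                     # near 0.25
print(recursive_tavc_centered([1.0 if y <= 0.5 else 0.0 for y in ys]))  # near 0.25
```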
5.2. Linear processes. Let $\varepsilon_i$, $i \in \mathbb{Z}$, be i.i.d. random variables with mean 0 and finite $\alpha$th moment ($\alpha > 2$), and let $(a_i)$ be a sequence of real coefficients; let $X_n = K(e_n)$, where $K$ is a measurable function for which $X_n \in \mathcal{L}^\alpha$ and $e_n = \sum_{i=0}^\infty a_i\varepsilon_{n-i}$ is a linear process. A special case is $K(x) = |x|$. Since $K$ may be nonlinear, the treatment of $\sum_{i=1}^nK(e_i)$ appears more difficult than that of $\sum_{i=1}^ne_i$, since the latter preserves the linearity structure.

We now apply Theorem 1 to the process $(X_i)$. Recall that $\varepsilon_0'$ is independent of $\varepsilon_i$, $i \in \mathbb{Z}$. Let $e_n' = e_n - a_n\varepsilon_0 + a_n\varepsilon_0'$. If $K$ is Lipschitz continuous, then $|K(e_n) - K(e_n')| = O(|a_n|)|\varepsilon_0 - \varepsilon_0'|$. Hence, the physical dependence measure $\delta_\alpha(n) = O(|a_n|)$ and, consequently, $\|\mathcal{P}_0X_n\|_\alpha = O(|a_n|)$ since $\omega_\alpha(n) \le \delta_\alpha(n)$. In this case (14) reduces to $\sum_{i=0}^\infty|a_i| < \infty$, which is a natural condition for short-range dependence. If the latter condition is violated, for example, if $a_i = i^{-\beta}$, $1/2 < \beta < 1$, then $(X_i)$ is a long-memory process and the normalizing sequence for $\sum_{i=1}^nX_i$ is $n^{3/2-\beta}$, which is different from $\sqrt n$. Correspondingly, $\sigma^2 = \infty$.

APPENDIX 

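A.1. Proof of Theorem 1. For $n \in \mathbb{N}$ choose $m = m_n \in \mathbb{N}$ such that $a_m \le n < a_{m+1}$. Then

$$v_n = \sum_{j=1}^{n}l_j = \sum_{i=2}^{m}\sum_{j=a_{i-1}}^{a_i-1}(j - a_{i-1} + 1) + \sum_{j=a_m}^{n}(j - a_m + 1) = \sum_{i=2}^{m}\frac{(a_i - a_{i-1})(a_i - a_{i-1} + 1)}{2} + \frac{(n - a_m + 1)(n - a_m + 2)}{2}. \tag{29}$$

Simple calculations show that (15) implies

$$1 \le \liminf_{n\to\infty}\frac{v_n}{v_{a_{m_n}}} \le \limsup_{n\to\infty}\frac{v_n}{v_{a_{m_n}}} \le \limsup_{m\to\infty}\frac{v_{a_{m+1}}}{v_{a_m}} = 1. \tag{30}$$

So the limits in the above expression are all 1. Also observe that for any fixed $k_0 \in \mathbb{N}$, since $a_{m+1} - a_m$ is increasing to $\infty$, we have

$$\lim_{n\to\infty}\frac{\#\{i \le n : i - t_i + 1 \le k_0\}}{v_n} \le \lim_{m\to\infty}\frac{mk_0}{v_{a_m}} = 0. \tag{31}$$

We now apply the martingale approximation in Wu (2007). Clearly (14) implies that $D_k := \sum_{i=k}^\infty\mathcal{P}_kX_i \in \mathcal{L}^\alpha$. Let $M_n = \sum_{i=1}^nD_i$. By Theorem 1 in Wu (2007), condition (14) also implies that

$$\|S_n\|_\alpha = O(\sqrt n), \quad \|M_n\|_\alpha = O(\sqrt n) \quad\text{and}\quad \|S_n - M_n\|_\alpha = o(\sqrt n). \tag{32}$$

Hence, as $n \to \infty$,

$$\rho_n := n^{-1}\|S_n^2 - M_n^2\|_{\alpha/2} \le n^{-1}\|S_n - M_n\|_\alpha\|S_n + M_n\|_\alpha \to 0. \tag{33}$$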
As with $V_n$ in (3), we introduce

$$Q_n = \sum_{i=1}^{n}R_i^2, \quad\text{where } R_i = D_{t_i} + D_{t_i+1} + \cdots + D_i.$$

Our plan is to first approximate $V_n$ by $Q_n$ such that $\|Q_n - V_n\|_{\alpha/2} = o(v_n)$ and then show that $\|Q_n/v_n - \sigma^2\|_{\alpha/2} = o(1)$. Clearly the theorem follows from these two assertions. For the former, let $k_0 \in \mathbb{N}$. By (33) and (31),

$$\begin{aligned}
\limsup_{n\to\infty}\frac{\|V_n - Q_n\|_{\alpha/2}}{v_n} &\le \limsup_{n\to\infty}v_n^{-1}\sum_{i=1}^{n}\|W_i^2 - R_i^2\|_{\alpha/2} \le \limsup_{n\to\infty}v_n^{-1}\sum_{i=1}^{n}(i - t_i + 1)\rho_{i-t_i+1}\\
&\le \limsup_{n\to\infty}v_n^{-1}\sum_{1\le i\le n:\ i-t_i+1>k_0}(i - t_i + 1)\rho_{i-t_i+1} \le \sup_{k>k_0}\rho_k \to 0 \quad\text{as } k_0 \to \infty.
\end{aligned}\tag{34}$$
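It remains to prove $\|Q_n/v_n - \sigma^2\|_{\alpha/2} = o(1)$. Note that $t_i = a_k$ if $a_k \le i \le a_{k+1} - 1$. Let

$$Y_k = \sum_{i=a_k}^{a_{k+1}-1}(D_{t_i} + D_{t_i+1} + \cdots + D_i)^2 = \sum_{i=a_k}^{a_{k+1}-1}(D_{a_k} + D_{a_k+1} + \cdots + D_i)^2$$

and

$$Z_k = \sum_{i=a_k}^{a_{k+1}-1}(D_{a_k}^2 + D_{a_k+1}^2 + \cdots + D_i^2).$$

By Burkholder's inequality, there exists a constant $c = c_\alpha$ such that

$$\|Y_k\|_{\alpha/2} \le \sum_{i=a_k}^{a_{k+1}-1}\|(D_{a_k} + D_{a_k+1} + \cdots + D_i)^2\|_{\alpha/2} = \sum_{i=a_k}^{a_{k+1}-1}\|D_{a_k} + D_{a_k+1} + \cdots + D_i\|_\alpha^2 \le \sum_{i=a_k}^{a_{k+1}-1}c_\alpha(i - a_k + 1)\|D_1\|_\alpha^2.$$

On the other hand,

$$\|Z_k\|_{\alpha/2} \le \sum_{i=a_k}^{a_{k+1}-1}(i - a_k + 1)\|D_1\|_\alpha^2.$$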
i=a>k 
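In the rest of the proof, $c_\alpha$ denotes a constant which only depends on $\alpha$ and its value may change from line to line. Since $1 < \alpha/2 \le 2$ and $Y_k - E(Y_k\mid\mathcal{F}_{a_k})$, $k = 1, 2, \ldots$, is a martingale difference sequence, we have by Burkholder's and Jensen's inequalities that

$$\Big\|\sum_{k=1}^{m}[Y_k - E(Y_k\mid\mathcal{F}_{a_k})]\Big\|_{\alpha/2}^{\alpha/2} \le c_\alpha\sum_{k=1}^{m}\|Y_k - E(Y_k\mid\mathcal{F}_{a_k})\|_{\alpha/2}^{\alpha/2} \le c_\alpha\sum_{k=1}^{m}\|Y_k\|_{\alpha/2}^{\alpha/2}. \tag{35}$$

Similarly,

$$\Big\|\sum_{k=1}^{m}[Z_k - E(Z_k\mid\mathcal{F}_{a_k})]\Big\|_{\alpha/2}^{\alpha/2} \le c_\alpha\sum_{k=1}^{m}\|Z_k\|_{\alpha/2}^{\alpha/2}. \tag{36}$$

Note that $D_j$ are also martingale differences. Simple calculations show that $E(Y_k\mid\mathcal{F}_{a_k}) = E(Z_k\mid\mathcal{F}_{a_k})$. By (35) and (36),

$$\begin{aligned}
\Big\|\sum_{k=1}^{m}(Y_k - Z_k)\Big\|_{\alpha/2}^{\alpha/2} &\le c_\alpha\sum_{k=1}^{m}\big(\|Y_k\|_{\alpha/2}^{\alpha/2} + \|Z_k\|_{\alpha/2}^{\alpha/2}\big) \le c_\alpha\|D_1\|_\alpha^\alpha\sum_{k=1}^{m}\Big[\sum_{i=a_k}^{a_{k+1}-1}(i - a_k + 1)\Big]^{\alpha/2}\\
&\le c_\alpha\|D_1\|_\alpha^\alpha\max_{k\le m}\Big[\sum_{i=a_k}^{a_{k+1}-1}(i - a_k + 1)\Big]^{\alpha/2-1}\sum_{k=1}^{m}\sum_{i=a_k}^{a_{k+1}-1}(i - a_k + 1).
\end{aligned}$$

By (15) and (30), since $a_{k+1} - a_k \to \infty$,

$$\frac{c_\alpha\|D_1\|_\alpha^\alpha}{v_{a_{m+1}-1}^{\alpha/2-1}}\Big[\max_{k\le m}(a_{k+1} - a_k)^2\Big]^{\alpha/2-1} \to 0. \tag{37}$$

By the ergodic theorem, since $D_1^2 \in \mathcal{L}^{\alpha/2}$, we have $\|D_1^2 + \cdots + D_l^2 - l\sigma^2\|_{\alpha/2} = o(l)$. Therefore, $\|Z_k - EZ_k\|_{\alpha/2} = o((a_{k+1} - a_k)^2)$ and, by (35) and (36),

$$\limsup_{m\to\infty}\frac{\|\sum_{k=1}^{m}(Z_k - EZ_k)\|_{\alpha/2}}{v_{a_m}} \le \limsup_{m\to\infty}\frac{\sum_{k=1}^{m}o((a_{k+1} - a_k)^2)}{v_{a_m}} = 0,$$

which, in view of (37), implies that $\|\sum_{k=1}^{m}Y_k - v_{a_{m+1}-1}\sigma^2\|_{\alpha/2} = o(v_{a_m})$.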
Finally, we shall compare $Q_n$ and $Q_{a_{m+1}-1} = \sum_{k=1}^mY_k$. To this end, again by (35) and (36), recalling $a_m \le n < a_{m+1}$,

$$\|Q_n - Q_{a_{m+1}-1}\|_{\alpha/2} \le \sum_{i=n+1}^{a_{m+1}-1}\|R_i^2\|_{\alpha/2} \le \sum_{i=n+1}^{a_{m+1}-1}c_\alpha(i - t_i + 1)\|D_1\|_\alpha^2 = \sum_{i=n+1}^{a_{m+1}-1}O(i - t_i + 1) = O((a_{m+1} - a_m)^2) = o(v_n),$$

which by (34) completes the proof.

A.2. Proof of Corollary 1. Observe that $V_n'$ remains unchanged if $X_i$ is replaced by $X_i - \mu$. So we can assume without loss of generality that $\mu = 0$. By (7) and Theorem 1, it suffices to verify that (i) $\|U_n\bar X_n\|_{\alpha/2} = o(v_n)$ and (ii) $\|q_n(\bar X_n)^2\|_{\alpha/2} = o(v_n)$. For (ii), by (32), $\|\bar X_n\|_\alpha = O(n^{-1/2})$. Choose $m \in \mathbb{N}$ such that $a_m \le n < a_{m+1}$. By (15),

$$(a_{m+1} - a_m)^2 = o(1)\sum_{k=2}^{m}(a_k - a_{k-1})^2 = o(a_m^2).$$

Since $a_m \to \infty$ and $a_m$ is increasing,

$$\max_{l\le m}(a_{l+1} - a_l) = o(a_m) = o(n). \tag{38}$$

Hence, $q_n \le v_n\max_{l\le m}(a_{l+1} - a_l) = v_n\,o(n)$ and (ii) follows. To show (i), we claim that

$$\|U_n\|_\alpha = O\Big[\Big(\sum_{i=1}^{m}(a_{i+1} - a_i)^5\Big)^{1/2}\Big]. \tag{39}$$

With the above relation, noting that $\sum_{i=1}^m(a_{i+1} - a_i)^4 \le [\sum_{i=1}^m(a_{i+1} - a_i)^2]^2$, we have by (29) and (38) that $\|U_n\bar X_n\|_{\alpha/2} \le \|U_n\|_\alpha\|\bar X_n\|_\alpha = o(v_n)$.

In the sequel we shall prove (39). To this end, recall U = i — U + 1 and let 



hj — hj jTl — ^ li~^-ti<j<ii 
i=l 



j = l,...,n. 



Then 



c4 = E^E^' = E*A- 

»=i j=*» i=i 

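Since $X_j = \sum_{k=0}^{\infty}\mathcal{P}_{j-k}X_j$ and $\mathcal{P}_{j-k}X_j$, $j \in \mathbb{Z}$, form martingale differences, we have by Burkholder's and Minkowski's inequalities that

$$\|U_n\|_\alpha \le \sum_{k=0}^{\infty}\Big\|\sum_{j=1}^{n}\mathcal{P}_{j-k}X_jh_j\Big\|_\alpha \le c_\alpha\sum_{k=0}^{\infty}\Big(\sum_{j=1}^{n}h_j^2\|\mathcal{P}_{j-k}X_j\|_\alpha^2\Big)^{1/2} = c_\alpha\Big(\sum_{j=1}^{n}h_j^2\Big)^{1/2}\sum_{k=0}^{\infty}\|\mathcal{P}_0X_k\|_\alpha.$$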

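By (14) and the definition of $h_j$ (with $h_j = 0$ for $j > n$), (39) follows from

$$\sum_{j=1}^{n}h_j^2 \le \sum_{k=1}^{m}\sum_{j=a_k}^{a_{k+1}-1}h_j^2 \le \sum_{k=1}^{m}\sum_{j=a_k}^{a_{k+1}-1}(a_{k+1}-a_k)^4 = \sum_{k=1}^{m}(a_{k+1}-a_k)^5.$$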
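The two facts used for (39), namely $U_n = \sum_i l_iW_i = \sum_j h_jX_j$ and $\sum_{j\le n}h_j^2 \le \sum_{k=1}^{m}(a_{k+1}-a_k)^5$ with $a_m \le n < a_{m+1}$, can be checked by brute force. In this sketch the block starts $a_k = k^2$ and the helper names are illustrative assumptions; integer data keep the comparison of the two expressions for $U_n$ exact.

```python
import random


def start_times(n, starts):
    """t[i] = t_i, the largest block start a_k <= i, for i = 1..n (t[0] unused)."""
    t, k = [0] * (n + 1), 0
    for i in range(1, n + 1):
        while k + 1 < len(starts) and starts[k + 1] <= i:
            k += 1
        t[i] = starts[k]
    return t


def check_h(xs, starts):
    """Return (U_n from l_i W_i, U_n from h_j X_j, sum_j h_j^2, sum_{k<=m} gap_k^5).

    starts = [a_1 = 1, a_2, ...] must extend beyond n = len(xs).
    """
    n = len(xs)
    t = start_times(n, starts)
    l = [0] + [i - t[i] + 1 for i in range(1, n + 1)]                  # l_i
    u_blocks = sum(l[i] * sum(xs[t[i] - 1:i]) for i in range(1, n + 1))
    h = [0] * (n + 1)                              # h_j = sum_i l_i 1{t_i <= j <= i}
    for i in range(1, n + 1):
        for j in range(t[i], i + 1):
            h[j] += l[i]
    u_weights = sum(h[j] * xs[j - 1] for j in range(1, n + 1))
    m = max(k for k in range(len(starts) - 1) if starts[k] <= n)     # 0-based m
    bound = sum((starts[k + 1] - starts[k]) ** 5 for k in range(m + 1))
    return u_blocks, u_weights, sum(hj * hj for hj in h), bound


starts = [k * k for k in range(1, 40)]             # a_k = k^2, up to a_39 = 1521
print(check_h([1] * 5, starts))                    # (19, 19, 83, 3368)
rng = random.Random(0)
xs = [rng.randint(-9, 9) for _ in range(300)]
u1, u2, sum_h2, bound = check_h(xs, starts)
print(u1 == u2, sum_h2 <= bound)                   # True True
```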



Remark 4. If $a_k = \lfloor ck^p\rfloor$, $k \ge 1$, where $c > 0$ and $p > 1$, then $m \sim (n/c)^{1/p}$ and, by (39), $\|U_n\|_\alpha = O[m^{(5p-4)/2}] = O(n^{5/2-2/p})$. Also note that $q_n \asymp n^{3-2/p}$. Hence, $\|V_n' - V_n\|_{\alpha/2} = O(q_n/n) + O(n^{5/2-2/p})/n^{1/2} = O(n^{2-2/p})$.
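For $a_k = \lfloor ck^p\rfloor$ the orders in Remark 4 can be checked numerically. The sketch below (with $c = 1$, an assumption that makes $a_1 = 1$) computes $v_n$, $q_n$ and $\sum_{k\le m}(a_{k+1}-a_k)^5$ in closed form block by block and estimates their log-log slopes between $n = 10^5$ and $n = 10^6$. These should be close to $2 - 1/p$ (for comparison), $3 - 2/p$ and $5 - 4/p$, the last being twice the exponent $5/2 - 2/p$ in the bound on $\|U_n\|_\alpha$.

```python
import math


def block_stats(n, p, c=1.0):
    """Return (v_n, q_n, S5) for the blocks a_k = floor(c * k**p).

    v_n = sum_{i<=n} l_i, q_n = sum_{i<=n} l_i^2 and
    S5 = sum_{k<=m} (a_{k+1} - a_k)^5 with a_m <= n < a_{m+1}.
    Assumes a_1 = 1 and strictly increasing a_k (true for c = 1, p > 1).
    """
    v = q = s5 = 0
    k = 1
    while True:
        a, a_next = math.floor(c * k ** p), math.floor(c * (k + 1) ** p)
        if a > n:
            break
        L = min(a_next - 1, n) - a + 1          # block length truncated at n
        v += L * (L + 1) // 2                    # 1 + 2 + ... + L
        q += L * (L + 1) * (2 * L + 1) // 6      # 1^2 + ... + L^2
        s5 += (a_next - a) ** 5
        k += 1
    return v, q, s5


def loglog_slopes(p, n1=10**5, n2=10**6):
    """Empirical growth exponents of (v_n, q_n, S5) between n1 and n2."""
    s1, s2 = block_stats(n1, p), block_stats(n2, p)
    return [math.log(b / a) / math.log(n2 / n1) for a, b in zip(s1, s2)]


for p in (1.5, 2.0):
    expected = (2 - 1 / p, 3 - 2 / p, 5 - 4 / p)
    print(p, [round(s, 3) for s in loglog_slopes(p)], [round(e, 3) for e in expected])
```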

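A.3. Proof of Theorem 2. (i) Recall (3) for $W_i = X_{t_i} + X_{t_i+1} + \cdots + X_i$ and (9) for the definition of the coupled process $(X_n')$. Let $W_i^* = X_{t_i}' + X_{t_i+1}' + \cdots + X_i'$. For notational simplicity write $\delta_j$ for $\delta_\alpha(j)$. Since $\varepsilon_0'$ is independent of $\varepsilon_l$, $l \in \mathbb{Z}$, we have $\mathrm{E}(W_i^2|\mathcal{F}_{-1}) = \mathrm{E}[(W_i^*)^2|\mathcal{F}_{-1}] = \mathrm{E}[(W_i^*)^2|\mathcal{F}_0]$. By Jensen's inequality, $\|\mathcal{P}_0X_i\|_\alpha \le \|X_i - X_i'\|_\alpha = \delta_i$, and (16) implies that $\Theta_\alpha = \sum_{i=0}^{\infty}\|\mathcal{P}_0X_i\|_\alpha < \infty$. By Theorem 1 in Wu (2007), $\|W_i\|_\alpha \le c_\alpha\Theta_\alpha(i - t_i + 1)^{1/2}$, where $c_\alpha$ is a constant. Since $\|W_i - W_i^*\|_\alpha \le \sum_{j=t_i}^{i}\|X_j - X_j'\|_\alpha = \sum_{j=t_i}^{i}\delta_j$, we have by Schwarz's and Jensen's inequalities that

$$\begin{aligned}
\|\mathcal{P}_0W_i^2\|_{\alpha/2} &= \|\mathrm{E}[W_i^2|\mathcal{F}_0] - \mathrm{E}[W_i^2|\mathcal{F}_{-1}]\|_{\alpha/2} = \|\mathrm{E}[W_i^2|\mathcal{F}_0] - \mathrm{E}[(W_i^*)^2|\mathcal{F}_0]\|_{\alpha/2}\\
&\le \|W_i^2 - (W_i^*)^2\|_{\alpha/2} \le \|W_i + W_i^*\|_\alpha\|W_i - W_i^*\|_\alpha\\
&\le 2\|W_i\|_\alpha\sum_{j=t_i}^{i}\delta_j \le 2c_\alpha\Theta_\alpha(i - t_i + 1)^{1/2}\sum_{j=t_i}^{i}\delta_j.
\end{aligned}$$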

Similarly, for $k \ge 0$,

$$\|\mathcal{P}_{i-k}W_i^2\|_{\alpha/2} \le 2\|W_i\|_\alpha\sum_{j=t_i}^{i}\delta_{k+t_i-j} \le 2c_\alpha\Theta_\alpha(i - t_i + 1)^{1/2}\sum_{j=t_i}^{i}\delta_{k+t_i-j}. \tag{40}$$
Since $\mathcal{P}_{i-k}W_i^2$, $i \in \mathbb{Z}$, form martingale differences, by Burkholder's inequality,

$$\Big\|\sum_{i=1}^{n}\mathcal{P}_{i-k}W_i^2\Big\|_{\alpha/2}^{\alpha/2} \le c_\alpha\sum_{i=1}^{n}\|\mathcal{P}_{i-k}W_i^2\|_{\alpha/2}^{\alpha/2}.$$



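By the triangle inequality, since $W_i^2 - \mathrm{E}W_i^2 = \sum_{k=0}^{\infty}\mathcal{P}_{i-k}W_i^2$, we have

$$\|V_n - \mathrm{E}V_n\|_{\alpha/2} \le \sum_{k=0}^{\infty}\Big\|\sum_{i=1}^{n}\mathcal{P}_{i-k}W_i^2\Big\|_{\alpha/2}. \tag{41}$$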



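If $a_m \le i < a_{m+1}$, then $t_i = a_m$ and $i - t_i \le a_{m+1} - 1 - a_m$. Let $b_m = \lfloor(1+c)p2^pm^{p-1}\rfloor$. Elementary calculations show that $a_{m+1} - 1 - a_m \le b_m$ for all $m \in \mathbb{N}$. Hence, choosing $m$ with $a_m \le n < a_{m+1}$, so that $i - t_i \le b_m$ for all $i \le n$, we have by (40)

$$\begin{aligned}
\sum_{k=2b_m}^{\infty}\Big\|\sum_{i=1}^{n}\mathcal{P}_{i-k}W_i^2\Big\|_{\alpha/2} &\le \sum_{k=2b_m}^{\infty}\Big(c_\alpha\sum_{i=1}^{n}\|\mathcal{P}_{i-k}W_i^2\|_{\alpha/2}^{\alpha/2}\Big)^{2/\alpha}\\
&\le O(1)\Big[\sum_{i=1}^{n}(i - t_i + 1)^{\alpha/4}\Big]^{2/\alpha}\sum_{k=2b_m}^{\infty}\sum_{j=0}^{b_m}\delta_{k-j}\\
&\le [O(nb_m^{\alpha/4})]^{2/\alpha}\,o(b_m) = o(n^{2/\alpha}b_m^{3/2}),
\end{aligned}\tag{42}$$

where we used $\sum_{k=2b_m}^{\infty}\sum_{j=0}^{b_m}\delta_{k-j} \le (b_m + 1)\sum_{s\ge b_m}\delta_s = o(b_m)$.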



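On the other hand,

$$\sum_{k=0}^{2b_m-1}\Big\|\sum_{i=1}^{n}\mathcal{P}_{i-k}W_i^2\Big\|_{\alpha/2} = O(b_m)\Big[\sum_{i=1}^{n}(i - t_i + 1)^{\alpha/4}\Big]^{2/\alpha} = O(n^{2/\alpha}b_m^{3/2}). \tag{43}$$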



Therefore, $\|V_n - \mathrm{E}V_n\|_{\alpha/2} = O(n^{2/\alpha}b_m^{3/2})$ and (i) follows since $b_m = O(n^{1-1/p})$.
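One way to carry out the elementary calculation for $b_m$, assuming $a_m = \lfloor cm^p\rfloor$ with $c > 0$ and $p > 1$: since $\lfloor x\rfloor - \lfloor y\rfloor < x - y + 1$ and, by the mean value theorem, $(m+1)^p - m^p \le p(m+1)^{p-1} \le p2^{p-1}m^{p-1}$, the integer $a_{m+1} - 1 - a_m$ is smaller than $cp2^{p-1}m^{p-1}$, which does not exceed $(1+c)p2^pm^{p-1}$. The short script below spot-checks the inequality over a grid of $(c, p)$ values; the grid is arbitrary.

```python
import math


def gap_bound_holds(c, p, m_max):
    """True if a_{m+1} - 1 - a_m <= b_m for m = 1..m_max, where
    a_m = floor(c * m**p) and b_m = floor((1 + c) * p * 2**p * m**(p - 1))."""
    for m in range(1, m_max + 1):
        a_m = math.floor(c * m ** p)
        a_next = math.floor(c * (m + 1) ** p)
        b_m = math.floor((1 + c) * p * 2 ** p * m ** (p - 1))
        if a_next - 1 - a_m > b_m:
            return False
    return True


print(all(gap_bound_holds(c, p, 2000)
          for c in (0.3, 1.0, 2.5, 7.0) for p in (1.1, 1.5, 2.0, 3.0)))  # True
```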

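(ii) Define $G_{h+1} = \sum_{i=a_h}^{a_{h+1}-1}W_i^2$. By Lemma 1 below, we have $\|G_{h+1} - \mathrm{E}(G_{h+1}|\mathcal{F}_{a_h})\|^2/(a_{h+1}-a_h)^4 \to \sigma^4/3$ as $h \to \infty$. Since $G_{h+1} - \mathrm{E}(G_{h+1}|\mathcal{F}_{a_h})$, $h = 1, 2, \ldots$, are martingale differences with respect to the filtration $(\mathcal{F}_{a_{h+1}})$, we have

$$\Big\|\sum_{h=1}^{m}[G_{h+1} - \mathrm{E}(G_{h+1}|\mathcal{F}_{a_h})]\Big\|^2 = \sum_{h=1}^{m}\|G_{h+1} - \mathrm{E}(G_{h+1}|\mathcal{F}_{a_h})\|^2 \sim \frac{\sigma^4}{3}\sum_{h=1}^{m}(a_{h+1}-a_h)^4 \sim \frac{\sigma^4p^4c^{3/p}}{12p-9}\,n^{4-3/p}.$$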
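Moreover, by orthogonality and Lemma 1(i),

$$\begin{aligned}
\Big\|\sum_{h=1}^{m}[\mathrm{E}(G_{h+1}|\mathcal{F}_{a_h}) - \mathrm{E}(G_{h+1}|\mathcal{F}_{a_{h-1}})]\Big\|^2 &= \sum_{h=1}^{m}\|\mathrm{E}(G_{h+1}|\mathcal{F}_{a_h}) - \mathrm{E}(G_{h+1}|\mathcal{F}_{a_{h-1}})\|^2\\
&\le \sum_{h=1}^{m}\|\mathrm{E}(G_{h+1}|\mathcal{F}_{a_h}) - \mathrm{E}(G_{h+1})\|^2 = \sum_{h=1}^{m}o((a_{h+1}-a_h)^4) = o(n^{4-3/p}).
\end{aligned}$$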
h=i 

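We now deal with $E_m := \sum_{h=1}^{m}[\mathrm{E}(G_{h+1}|\mathcal{F}_{a_{h-1}}) - \mathrm{E}(G_{h+1})]$. For $a_h \le i \le a_{h+1} - 1$, since $\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}}) - \mathrm{E}(W_i^2) = \sum_{k=0}^{\infty}\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}})$, we have

$$\|E_m\| \le \sum_{k=0}^{\infty}\Big\|\sum_{h=1}^{m}\sum_{i=a_h}^{a_{h+1}-1}\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}})\Big\| = \sum_{k=0}^{\infty}\Big(\sum_{h=1}^{m}\sum_{i=a_h}^{a_{h+1}-1}\|\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}})\|^2\Big)^{1/2}.$$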



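Observe that $\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}}) = 0$ if $i - k > a_{h-1}$, and $\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}}) = \mathcal{P}_{i-k}W_i^2$ if $i - k \le a_{h-1}$. By (40), as in the proof of (42), we have

$$\sum_{k=2b_m}^{\infty}\Big(\sum_{h=1}^{m}\sum_{i=a_h}^{a_{h+1}-1}\|\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}})\|^2\Big)^{1/2} = o(n^{1/2}b_m^{3/2}).$$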



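For $0 \le k \le 2b_m - 1$, since $\Theta_\alpha(l) = \sum_{j=l}^{\infty}\delta_\alpha(j) \to 0$ as $l \to \infty$,

$$\begin{aligned}
\sum_{h=1}^{m}\sum_{i=a_h}^{a_{h+1}-1}\|\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}})\|^2 &\le O(1)\sum_{h=1}^{m}\sum_{i=a_h}^{a_{h+1}-1}(i - t_i + 1)\Big[\sum_{j=k+t_i-i}^{\infty}\delta_j\Big]^2\mathbf{1}_{i-k\le a_{h-1}}\\
&\le O(1)\sum_{h=1}^{m}\sum_{i=a_h}^{a_{h+1}-1}(i - a_h + 1)\,\Theta_\alpha^2(a_h - a_{h-1})\\
&= \sum_{h=1}^{m}o(h^{2p-2}) = o(m^{2p-1}).
\end{aligned}$$

Hence,

$$\sum_{k=0}^{2b_m-1}\Big(\sum_{h=1}^{m}\sum_{i=a_h}^{a_{h+1}-1}\|\mathcal{P}_{i-k}\mathrm{E}(W_i^2|\mathcal{F}_{a_{h-1}})\|^2\Big)^{1/2} = o(b_mm^{p-1/2})$$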



and (ii) follows in view of

$$\Big\|\sum_{i=n+1}^{a_{m+1}-1}(W_i^2 - \mathrm{E}(W_i^2))\Big\| \le \sum_{i=n+1}^{a_{m+1}-1}\|W_i\|_4^2 = O(b_m^2) = o(b_mm^{p-1/2}),$$

since $\|W_i\|_4^2 = O(i - t_i + 1) = O(b_m)$, $a_{m+1} - 1 - a_m \le b_m$ and $b_m = \lfloor(1+c)p2^pm^{p-1}\rfloor$.
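The constant in the display for (ii) comes from $a_{h+1} - a_h \approx cph^{p-1}$ and $m \approx (n/c)^{1/p}$, which give $\frac13\sum_{h=1}^{m}(a_{h+1}-a_h)^4 \sim \frac{p^4c^{3/p}}{12p-9}n^{4-3/p}$. The sketch below (taking $\sigma = 1$, $n = a_m$ and $a_h = \lfloor ch^p\rfloor$, all illustrative assumptions) compares the two sides for two parameter choices.

```python
import math


def normalized_fourth_moment_sum(c, p, m):
    """(1/3) * sum_{h=1}^m (a_{h+1} - a_h)^4 / n^(4 - 3/p), with n = a_m
    and a_h = floor(c * h**p) (an illustrative choice of blocks)."""
    a = [math.floor(c * h ** p) for h in range(1, m + 2)]   # a_1, ..., a_{m+1}
    s4 = sum((a[h + 1] - a[h]) ** 4 for h in range(m))      # h = 1, ..., m
    n = a[m - 1]                                             # n = a_m
    return s4 / 3 / n ** (4 - 3 / p)


def limit_constant(c, p):
    """p^4 c^(3/p) / (12p - 9): the limiting value when sigma = 1."""
    return p ** 4 * c ** (3 / p) / (12 * p - 9)


for c, p, m in ((1.0, 2.0, 2000), (2.0, 1.5, 20000)):
    print(c, p, normalized_fourth_moment_sum(c, p, m), limit_constant(c, p))
```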

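(iii) Let $j \ge 0$. Since the $\mathcal{P}_i$, $i \in \mathbb{Z}$, are orthogonal and $X_j = \sum_{i\in\mathbb{Z}}\mathcal{P}_iX_j$,

$$|\gamma(j)| = |\mathrm{E}(X_0X_j)| \le \sum_{i\in\mathbb{Z}}\|\mathcal{P}_iX_0\|\,\|\mathcal{P}_iX_j\| \le \sum_{i\in\mathbb{Z}}\delta_{-i}\delta_{j-i}.$$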

Here we let $\delta_i = 0$ if $i < 0$. By (19), $\sum_{j=l}^{\infty}|\gamma(j)| = O(l^{-q})$ as $l \to \infty$.

Consequently, for $S_l = X_1 + \cdots + X_l$, since $0 < q < 1$,

$$|\mathrm{E}S_l^2 - l\sigma^2| \le 2\sum_{j=1}^{\infty}\min(j,l)|\gamma(j)| = O(l^{1-q}).$$
i=i 

Therefore,

$$|\mathrm{E}V_n - v_n\sigma^2| \le \sum_{i=1}^{n}|\mathrm{E}W_i^2 - (i - t_i + 1)\sigma^2| = \sum_{i=1}^{n}O[(i - t_i + 1)^{1-q}] = O(nb_m^{1-q}) = O[n^{1+(1-q)(1-1/p)}].$$
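The bound on $|\mathrm{E}S_l^2 - l\sigma^2|$ in (iii) comes from the exact identity $\mathrm{E}S_l^2 - l\sigma^2 = -2\sum_{j\ge1}\min(j,l)\gamma(j)$, valid for any absolutely summable covariance function. The sketch checks it for the AR(1) covariance $\gamma(j) = \phi^{|j|}/(1-\phi^2)$, an illustrative choice not taken from the paper, with the infinite sum truncated at $J$ terms.

```python
def variance_identity(phi, l, J=2000):
    """Return (E S_l^2 - l*sigma^2, -2 * sum_{j=1}^J min(j, l) * gamma(j)).

    Illustrative covariance: gamma(j) = phi^|j| / (1 - phi^2) (AR(1) with
    unit innovation variance), for which sigma^2 = 1 / (1 - phi)^2.
    """
    def gamma(j):
        return phi ** abs(j) / (1 - phi ** 2)

    sigma2 = 1 / (1 - phi) ** 2                                    # sum of all gamma(j)
    es2 = sum(gamma(i - k) for i in range(l) for k in range(l))   # E S_l^2
    lhs = es2 - l * sigma2
    rhs = -2 * sum(min(j, l) * gamma(j) for j in range(1, J + 1))
    return lhs, rhs


for phi in (0.5, -0.3, 0.8):
    print(phi, variance_identity(phi, 40))   # the two entries agree
```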
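Lemma 1. Assume that $X_i \in \mathcal{L}^\alpha$, $\mathrm{E}X_i = 0$ and (16) holds for some $\alpha > 4$. Let $S_i = \sum_{j=1}^{i}X_j$. Then we have (i) $\|\sum_{i=1}^{l}(\mathrm{E}(S_i^2|\mathcal{F}_1) - \mathrm{E}(S_i^2))\| = o(l^2)$ and (ii)

$$\lim_{l\to\infty}\frac{\|\sum_{i=1}^{l}(S_i^2 - \mathrm{E}(S_i^2))\|^2}{l^4} = \frac{\sigma^4}{3}. \tag{44}$$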

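Proof. As in (40), for $r \le 1$, $\|\mathcal{P}_rS_i^2\| \le Ci^{1/2}\sum_{j=1}^{i}\delta_\alpha(j - r)$, where $C = 2c_\alpha\Theta_\alpha$. Since $\sum_{i=1}^{l}(\mathrm{E}(S_i^2|\mathcal{F}_1) - \mathrm{E}(S_i^2)) = \sum_{r=-\infty}^{1}\sum_{i=1}^{l}\mathcal{P}_rS_i^2$, by orthogonality, (i) follows from

$$\begin{aligned}
\Big\|\sum_{i=1}^{l}(\mathrm{E}(S_i^2|\mathcal{F}_1) - \mathrm{E}(S_i^2))\Big\|^2 &= \sum_{r=-\infty}^{1}\Big\|\sum_{i=1}^{l}\mathcal{P}_rS_i^2\Big\|^2 \le \sum_{r=-\infty}^{1}\Big(\sum_{i=1}^{l}\|\mathcal{P}_rS_i^2\|\Big)^2\\
&\le \sum_{r=-\infty}^{1}\Big(Cl^{3/2}\sum_{j=1}^{l}\delta_\alpha(j - r)\Big)^2 \le \sum_{r=-\infty}^{1}C^2l^3\Big(\sum_{s=0}^{\infty}\delta_\alpha(s)\Big)\sum_{j=1}^{l}\delta_\alpha(j - r)\\
&= O(l^3)\sum_{j=1}^{l}\sum_{r=-\infty}^{1}\delta_\alpha(j - r) = o(l^4),
\end{aligned}$$

since $\sum_{r=-\infty}^{1}\delta_\alpha(j - r) = \sum_{s\ge j-1}\delta_\alpha(s) \to 0$ as $j \to \infty$.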

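For (ii), let $A_l = \sum_{i=1}^{l}S_i^2/l^2$. By the invariance principle (13) and the continuous mapping theorem, we have $A_l \Rightarrow \sigma^2\int_0^1\mathbb{B}(t)^2\,dt$, where $\mathbb{B}$ is a standard Brownian motion. By Theorem 1 in Wu (2007), $\|S_i\|_\alpha = O(\sqrt{i})$. So

$$\|A_l\|_{\alpha/2} \le \sum_{i=1}^{l}\frac{\|S_i^2\|_{\alpha/2}}{l^2} = \sum_{i=1}^{l}\frac{\|S_i\|_\alpha^2}{l^2} = \sum_{i=1}^{l}\frac{O(i)}{l^2} = O(1).$$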

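Since $\alpha/2 > 2$, $\{[A_l - \mathrm{E}(A_l)]^2, l \ge 1\}$ is uniformly integrable [Chow and Teicher (1988)]. Hence, the weak convergence of $A_l$ implies the $\mathcal{L}^2$ moment convergence

$$\mathrm{E}\{[A_l - \mathrm{E}(A_l)]^2\} \to \sigma^4\,\mathrm{E}\Big\{\Big[\int_0^1\big(\mathbb{B}(t)^2 - \mathrm{E}(\mathbb{B}(t)^2)\big)\,dt\Big]^2\Big\} = \frac{\sigma^4}{3},$$

which is (44).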
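Lemma 1(ii) can be checked exactly in the simplest case. For i.i.d. $N(0,1)$ variables (so $\sigma^2 = 1$; an illustrative special case, whereas Lemma 1 covers dependent processes), $\mathrm{Cov}(S_i^2, S_j^2) = 2\min(i,j)^2$, and summing over $i, j \le l$ gives $\mathrm{Var}(\sum_{i=1}^{l}S_i^2) = (l^4 + 2l^3 + 2l^2 + l)/3$, so the ratio in (44) tends to $1/3$. The sketch evaluates the double sum and compares it with the closed form.

```python
def var_sum_sq_partial_sums(l):
    """Var(S_1^2 + ... + S_l^2) for iid N(0, 1) X_i (sigma^2 = 1).

    Uses Cov(S_i^2, S_j^2) = 2 * Cov(S_i, S_j)^2 = 2 * min(i, j)^2, valid
    for jointly Gaussian centred variables.
    """
    # ordered pairs (i, j) in {1..l}^2 with min(i, j) = k: 2 * (l - k) + 1
    return 2 * sum(k * k * (2 * (l - k) + 1) for k in range(1, l + 1))


def closed_form(l):
    """Exact value (l^4 + 2l^3 + 2l^2 + l) / 3, an integer for every l."""
    return (l ** 4 + 2 * l ** 3 + 2 * l ** 2 + l) // 3


l = 10 ** 4
print(var_sum_sq_partial_sums(l) / l ** 4)   # close to 1/3
```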

A.4. Proof of Corollary 3. Choose $d \in \mathbb{N}$ such that $2^{d-1} < N \le 2^d$. Using the same argument as in the proof of Theorem 2 [see (41)-(43) therein], we have for $1 \le a < b$ that

$$\|V_b - V_a - \mathrm{E}(V_b - V_a)\|_{\alpha/2} = \Big\|\sum_{i=a+1}^{b}(W_i^2 - \mathrm{E}W_i^2)\Big\|_{\alpha/2} = O[b^{3(1-1/p)/2}(b - a)^{2/\alpha}],$$

where the constant in $O$ does not depend on $a$ and $b$. To show (21), we shall apply a useful maximal inequality established in Wu (2007). By Proposition 1 in the latter paper,

$$\Big\|\max_{n\le 2^d}|V_n - \mathrm{E}V_n|\Big\|_{\alpha/2} \le \sum_{r=0}^{d}\Big[\sum_{l=1}^{2^{d-r}}\|V_{2^rl} - V_{2^r(l-1)} - \mathrm{E}(V_{2^rl} - V_{2^r(l-1)})\|_{\alpha/2}^{\alpha/2}\Big]^{2/\alpha}.$$

Note that

$$\sum_{l=1}^{2^{d-r}}\|V_{2^rl} - V_{2^r(l-1)} - \mathrm{E}(V_{2^rl} - V_{2^r(l-1)})\|_{\alpha/2}^{\alpha/2} = \sum_{l=1}^{2^{d-r}}O\{[(2^rl)^{3(1-1/p)/2}(2^r)^{2/\alpha}]^{\alpha/2}\} = O(1)\,2^{r+3r(1-1/p)\alpha/4}(2^{d-r})^{1+3(1-1/p)\alpha/4}.$$

Hence,

$$\Big\|\max_{n\le 2^d}|V_n - \mathrm{E}V_n|\Big\|_{\alpha/2} = O\{(d+1)(2^d)^{2/\alpha+3(1-1/p)/2}\},$$

and (21) follows in view of $2^{d-1} < N \le 2^d$.
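Proposition 1 of Wu (2007) is quoted rather than proved here. The mechanism behind such dyadic maximal inequalities can be illustrated pathwise: every $0 < n \le 2^d$ splits into at most one block $(2^r(l-1), 2^rl]$ per level $r$, so $\max_{n\le 2^d}|S_n| \le \sum_{r=0}^{d}(\sum_{l}|S_{2^rl} - S_{2^r(l-1)}|^q)^{1/q}$ for any $q \ge 1$, and taking $\mathcal{L}^q$ norms with Minkowski's inequality yields a bound of the displayed form with $q = \alpha/2$. The sketch below uses generic partial sums $S_n$ in place of $V_n - \mathrm{E}V_n$ and checks the decomposition and the pathwise bound; it is an illustration, not a reproduction of Wu's proof.

```python
import random


def dyadic_pieces(n):
    """Split (0, n] into blocks (2^r (l - 1), 2^r l], one per set bit of n."""
    pieces, start = [], 0
    for r in reversed(range(n.bit_length())):
        if (n >> r) & 1:
            start += 1 << r
            pieces.append((r, start >> r))   # start is a multiple of 2^r here
    return pieces


def maximal_bound(increments, d, q):
    """Return (max_{n<=2^d} |S_n|, sum_r (sum_l |S_{2^r l} - S_{2^r (l-1)}|^q)^(1/q))."""
    S = [0.0]
    for x in increments[: 1 << d]:
        S.append(S[-1] + x)
    lhs = max(abs(s) for s in S[1:])
    rhs = 0.0
    for r in range(d + 1):
        w = 1 << r
        blocks = (abs(S[w * l] - S[w * (l - 1)]) for l in range(1, (1 << (d - r)) + 1))
        rhs += sum(b ** q for b in blocks) ** (1 / q)
    return lhs, rhs


print(dyadic_pieces(6))                      # [(2, 1), (1, 3)]
rng = random.Random(0)
incs = [rng.gauss(0.0, 1.0) for _ in range(256)]
print(maximal_bound(incs, 8, 1.5))           # first entry <= second entry
```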

We now show (22). Note that $\alpha/2 > 1$. By (21), we have

00 ||max„ <2d \V n -EV n \\\ a/ , 2 00 



which by the Borel-Cantelli lemma implies that Vn — EVjv = o[-/V T (log N)' 

almost surely. Consequently, (22) easily follows from $\mathrm{E}V_n - v_n\sigma^2 = O[n^{1+(1-q)(1-1/p)}]$.